CLASSICAL INFORMATION STORAGE IN AN n-LEVEL QUANTUM SYSTEM

Dedicated to Professor Andor Frenkel on the occasion of his 80th birthday

Abstract. A game is played by a team of two, say Alice and Bob, in
which the value of a random variable $x$ is revealed to Alice only, who
cannot freely communicate with Bob. Instead, she is given a quantum
n-level system, respectively a classical n-state system, which she can
put in possession of Bob in any state she wishes. We evaluate how
successfully they managed to store and recover the value of $x$ in the
used system by requiring Bob to specify a value $z$ and giving a reward
of value $f(x, z)$ to the team. We show that whatever the probability
distribution of $x$ and the reward function $f$ are, when using a
quantum n-level system, the maximum expected reward obtainable with the
best possible team strategy is equal to that obtainable with the use of
a classical n-state system.

The proof relies on mixed discriminants of positive matrices and,
perhaps surprisingly, an application of the Supply-Demand Theorem for
bipartite graphs. As a corollary, we get an infinite set of new,
dimension dependent inequalities regarding positive operator valued
measures and density operators on complex n-space.

1. Introduction

In contrast to a classical bit which has only 2 pure states, a qubit
has infinitely many. However, this does not necessarily mean that we
can store more (classical) information in a qubit than in a classical
bit. The point is that although the qubit has infinitely many different
pure states, it is impossible to distinguish these states with certainty.
This is a fundamental fact, and cannot be circumvented by some better
measuring device.

In the case of a qubit, one can distinguish with certainty between at
most 2 states. In general, in the case of an n-level system (whose state
space is modelled by the set of density operators of an n-dimensional
complex Hilbert space), one can distinguish with certainty between at
most n states. So in this respect a quantum n-level system performs
like a classical n-state system. However, this does not make them
necessarily equivalent. Perhaps it is possible to distinguish between
$k > n$ states of a quantum n-level system not with certainty, but in a
way that is, in some sense, "closer" to certainty than what we can
achieve with a classical n-state system.

How to define "closer to certainty"? In general, we may view our qubit
as a memory (or, as is often done in the literature, as a channel¹) in
which there is an ingoing and an outgoing information: Alice chooses a
certain state from a previously fixed set of states and puts the system
into the selected state, then passes it to Bob, who will have to try
to figure out the chosen state. So one may investigate the issue from
the point of view of some kind of (classical) channel capacity. One
idea is of course to use Shannon-type informational capacity, which
is well-investigated in the quantum setting; see e.g. [11] on quantum
information theory and [9] on quantum entropy.

However, here we argue that this in itself cannot fully settle the 
problem. As an example, suppose that the following game is played. 

A $1 bill is put randomly and with equal probability into one of 3 
boxes. Bob will pick one of the boxes and he will get what is inside 
that box. Alice knows where the $1 bill has been put, and she wishes 
to help Bob. However, Alice is not allowed to directly tell Bob where 
the money is (in which case Bob could always get the $1 bill with 
certainty). Instead, she is only allowed to send to Bob a classical bit, 
respectively a qubit (not previously entangled to anything else) whose 
state she can manipulate as she wishes. That is, she is allowed to send 
a classical or a quantum bit of information. 

They may agree on some scheme beforehand. For example, played
with a classical bit, Alice and Bob can agree that the bit-value 0 will
mean that the money is in box nr. 1 (in which case Bob will pick box
nr. 1) and the bit-value 1 will mean that the money is either in box nr.
2 or in box nr. 3 (in which case Bob will toss a coin and accordingly
pick box nr. 2 or 3). The question then becomes: after the game is
played once, what is the expected value of the money won?

Every team-strategy leads to a specific channel matrix, i.e. a
collection of conditional probabilities $a_{ij}$, where

$$a_{ij} = P(\text{Bob chooses the } i\text{-th box} \mid \text{the money is in the } j\text{-th box}),$$

so that for example the previously described simple strategy using a
classical bit gives the channel matrix

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/2 & 1/2 \\ 0 & 1/2 & 1/2 \end{pmatrix}.$$

¹ Usually by specifying a channel one means to fix a collection of states
corresponding to encoding, while the measurement on the decoding side
remains unspecified. For us neither encoding nor decoding is fixed as both
are to be optimized; only the level n of the system (i.e. the dimension of
the Hilbert space) is fixed. For this reason we prefer to talk about a
memory unit rather than a channel.

In general, the channel matrix $A$ will be a stochastic matrix: its entries
are all nonnegative, and each column sums to 1. The above channel
matrix will allow Bob to have an expected win of 2/3 dollars, so we
may say that its "money capacity" is $c_{\$} = 2/3$. It is an elementary
exercise to show that this is the best we can get using a classical bit.
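The elementary exercise can also be checked by brute force. The following is a minimal sketch, assuming the three-box game with a uniformly placed bill described above; the names `expected_win`, `BOXES` and `BITS` are illustrative and not taken from the paper. Since the expected win is linear in the channel matrix, it suffices to enumerate the deterministic strategies.

```python
from fractions import Fraction
from itertools import product

BOXES = range(3)   # possible positions of the $1 bill, each with probability 1/3
BITS = range(2)    # values of the classical bit Alice may send

def expected_win(encode, decode):
    """Expected win when Alice sends encode[j] and Bob opens box decode[bit]."""
    hits = sum(1 for j in BOXES if decode[encode[j]] == j)
    return Fraction(hits, len(BOXES))

# Enumerate all deterministic strategies: encode maps boxes to bits,
# decode maps bits to boxes. Randomized strategies are convex combinations
# of these, so they cannot do better.
best = max(
    expected_win(encode, decode)
    for encode in product(BITS, repeat=3)
    for decode in product(BOXES, repeat=2)
)
print(best)  # 2/3
```

The maximum is attained, for instance, by the strategy described in the text with Bob's coin toss replaced by a fixed choice.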

Note that in general $c_{\$}(A) = \frac{1}{3}\operatorname{tr}(A)$, whereas the "usual"
(informational) capacity $c(A)$ would be the maximum mutual information
between Bob's choice and the actual place of the money; for the above
channel matrix $c$ is precisely 1 bit. Now, by Holevo's celebrated
theorem [5], even if Alice and Bob used a quantum bit instead of a classical
one, the informational capacity $c$ could not get above 1 bit. This is in
accordance with the (common) belief that a single qubit (on its own)
is worth no more than a classical one.

The fact that in superdense coding [3] Alice manages to transmit 2
bits of information to Bob by physically sending only 1 qubit is no
contradiction to what was said. Indeed, in superdense coding 2 qubits
are used: for the decoding part both of them are necessary. However,
for the encoding part only one of them is needed; this is achieved by
previously entangling the 2 qubits. So the 2 classical bits are actually
stored in 2 quantum bits; the surprising feature is rather that it is a
kind of 2-bit memory made out of 2 qubits where for the "read out" we
need both, but for the "write in" we only need to get in touch with one
of them.

However, here we are interested in how much better (if at all) a
qubit on its own (not entangled to other qubits) is than a classical bit.
So can we apply Holevo's theorem to conclude that in the described
game, even by using a qubit, Alice and Bob cannot win more than 2/3
of a dollar (i.e. the maximum amount possible when a classical bit is
used)? The answer is negative. In fact, consider the stochastic matrix

$$A = \begin{pmatrix} 3/4 & 1/8 & 1/8 \\ 1/8 & 3/4 & 1/8 \\ 1/8 & 1/8 & 3/4 \end{pmatrix}.$$

Its "money capacity" $c_{\$}(A) = \frac{1}{3}\operatorname{tr} A = \frac{3}{4}$ is larger than what can be
achieved by using a classical bit. However, elementary arguments
together with a straightforward computation show that
$c(A) = \frac{7}{4}\log(3) - \frac{9}{4}\log(2)$, which is smaller than $\log(2)$, i.e., smaller
than one bit. This should not be considered a surprise: in Shannon's Noisy
Channel Coding Theorem, the channel capacity as reliable transmission rate
is only achieved in the "long run"; with a single use of the channel,
things can be different.
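The two capacities of this matrix can be compared numerically. The sketch below assumes the standard fact that a symmetric channel attains its capacity at the uniform input distribution; the helper names `money_capacity` and `mutual_information` are illustrative only.

```python
from math import log2

# a[i][j] = P(Bob chooses box i | money is in box j)
A = [[3/4, 1/8, 1/8],
     [1/8, 3/4, 1/8],
     [1/8, 1/8, 3/4]]

def money_capacity(a):
    """Expected win for a uniformly placed $1 bill: trace divided by the number of boxes."""
    return sum(a[i][i] for i in range(len(a))) / len(a)

def mutual_information(a, prior):
    """Mutual information in bits between input (prior) and output of the channel a."""
    k, l = len(a), len(prior)
    out = [sum(a[i][j] * prior[j] for j in range(l)) for i in range(k)]
    total = 0.0
    for i in range(k):
        for j in range(l):
            joint = a[i][j] * prior[j]
            if joint > 0:
                total += joint * log2(a[i][j] / out[i])
    return total

# Assumption: for this symmetric channel the uniform prior is capacity-achieving.
capacity = mutual_information(A, [1/3, 1/3, 1/3])
closed_form = 7/4 * log2(3) - 9/4   # (7/4) log 3 - (9/4) log 2, in bits

print(money_capacity(A))        # 0.75
print(capacity, closed_form)    # both roughly 0.52 bits, below 1 bit
```

So the matrix beats the classical bound 2/3 in expected win while its informational capacity stays well below one bit.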

Thus, as shown by the previous example, Holevo's theorem cannot 
rule out the existence of a strategy by which using a qubit Alice and 
Bob can win more money in this game (in terms of expected values) 
than what is possible using a classical bit. 

Nevertheless, for this actual game it is not difficult to come up with
an argument [8] to show that $c_{\$} \le 2/3$ holds even if Alice and Bob
use a qubit. However, why stick to exactly this game, and why
investigate only the case of 2-level systems (i.e. single bits)?

In general we might consider the following scheme. The value of a
random variable $x$ is revealed to Alice but not to Bob. Though previous
to the game they can meet and agree to follow any kind of strategy
they like, during the game Alice cannot freely communicate with Bob.
Instead, she is given a quantum n-level system (or alternatively: a
classical n-state system) which she can put in possession of Bob in any
state she wishes. Bob is then required to specify a value $z$. We evaluate
how successfully they managed to store and recover the value of $x$ by
giving a reward of value $f(x, z)$ to the team, where $f$ is some previously
fixed "reward function" (e.g. in our original game the reward was $\$1$ if
$x = z$ and zero otherwise).

Consider the sets $C_n(k, l)$ and $Q_n(k, l)$ defined as convex hulls of all
possible $k \times l$ channel matrices that can be obtained by Alice passing
to Bob a classical n-state system or a quantum n-level system,
respectively. We postpone a more detailed description of these sets to the
next section, but note that obviously $C_n(k, l) \subset Q_n(k, l)$.

Since we think of the distribution of $x$ as fixed with the rules of the
game, the expected reward $E(f(x, z))$ is just an arbitrary affine linear
functional of the channel matrix of Alice and Bob (e.g. in our original
game it was 1/3 times the trace). It follows that there would exist a
game of the specified type in which it is more efficient to use a quantum
n-level system than a classical one if and only if $C_n(k, l) \neq Q_n(k, l)$.
However, our main result is that $C_n(k, l) = Q_n(k, l)$ for all values of
$n, k, l$.

For $n \ge \min(k, l)$ the above equality is trivial as both sets consist of
all stochastic $k \times l$ matrices, but in general it is nontrivial in the sense
that it results in some new, dimension dependent inequalities regarding
positive operator valued measures and density matrices.

Notations and terminology. The set $\{1, \ldots, k\}$ is denoted by $[k]$.
We write $e_{ij}$ for the matrix that has an entry 1 at position $(i, j)$ and all
other entries zero. The identity matrix is 1. A matrix is stochastic if all
entries are nonnegative reals and each column sums to 1. A complex
matrix $A$ is psdh if it is positive semidefinite Hermitian, written $A \ge 0$.
A positive operator valued measure (POVM), also called a partition of
unity, is a sequence $E_1, \ldots, E_k$ of psdh matrices summing to 1. A
density matrix is a psdh matrix with trace 1.

2. Sets of channel matrices 

Throughout this section, let $n$, $k$ and $l$ be fixed positive integers.

Suppose that a letter of an alphabet containing $l$ letters is to be
encoded by Alice in a classical n-state system, whereas on the decoding
side Bob uses an alphabet containing $k$ letters. Which channel matrices
can Alice and Bob realize by a suitable strategy?

Every strategy is a convex combination of pure strategies, i.e.
strategies (we are in the classical setting) in which no randomness
appears. Thus in the case of a pure strategy, the channel matrix is a
$k \times l$ matrix such that

(1) each entry is either 1 or 0, 

(2) in each column there is exactly one 1, 

(3) the number of nonzero rows is at most n. 

This last property is due to the fact that Alice and Bob use a classical 
n-state system: if no randomness is used, then at decoding at most n 
different things can happen, regardless of the number k of letters that 
Bob has in his alphabet. Thus we make the following 

Definition 1. Let $C = C_n(k, l)$ be the convex hull of $k \times l$ matrices
satisfying properties (1), (2) and (3) above.

Then $C$ is a convex polytope whose vertices are the $k \times l$ matrices
satisfying (1), (2) and (3). Note that $C$ is also the convex hull of $k \times l$
stochastic matrices with at most $n$ nonzero rows.
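To make Definition 1 concrete, the vertices can be enumerated directly. This is a minimal sketch; the generator name `vertices` is an illustrative choice, and rows and columns are indexed from 0 rather than from 1.

```python
from itertools import product

def vertices(n, k, l):
    """Yield the 0-1 k x l matrices with exactly one 1 per column
    and at most n nonzero rows, i.e. the matrices satisfying (1), (2), (3)."""
    for rows in product(range(k), repeat=l):   # rows[j] = row holding the 1 of column j
        if len(set(rows)) <= n:
            yield [[1 if rows[j] == i else 0 for j in range(l)] for i in range(k)]

# Example: n = 2, k = l = 3. Of the 27 maps from columns to rows,
# the 6 bijections use three rows and are excluded.
count = sum(1 for _ in vertices(2, 3, 3))
print(count)  # 21
```

Each vertex corresponds to a deterministic decoding that uses at most n of Bob's k letters.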

Now suppose that instead of a classical n-state system, Alice can use
an n-level quantum system. Its state space can be identified with the
set of complex $n \times n$ density matrices: the set of matrices $\rho \in M_n(\mathbb{C})$
such that

$$\rho \ge 0, \qquad \operatorname{tr}(\rho) = 1.$$

A specific measurement scheme with $k$ possible outcomes gives rise to
an affine map from this state space to the set of classical probability
distributions on the set $[k] = \{1, 2, \ldots, k\}$. Such an affine map is always
given in the following way: we have a positive operator valued measure
(POVM) $E_1, E_2, \ldots, E_k \in M_n(\mathbb{C})$, i.e. $E_i \ge 0$ for each $i = 1, 2, \ldots, k$
and $E_1 + E_2 + \ldots + E_k = 1$ (identity matrix), and the map in question
is

$$\rho \mapsto (\operatorname{tr}(E_1\rho), \operatorname{tr}(E_2\rho), \ldots, \operatorname{tr}(E_k\rho)),$$

see details in [6, Section 1.6]. Thus the most general measurement
with $k$ outcomes is a POVM $E_1, E_2, \ldots, E_k$ in the sense that if the
state of the system was described by the density operator $\rho$ then the
measurement will result in the $i$-th outcome with probability $\operatorname{tr}(E_i\rho)$.

Now let us return to the set of possible channel matrices. On the
encoding side, Alice needs to choose a state for each letter to be encoded.
On the decoding side, Bob needs to choose a measurement whose result
will be interpreted as "read out". Thus a specific strategy is given by
the choice of $l$ density matrices $\rho_1, \rho_2, \ldots, \rho_l \in M_n(\mathbb{C})$ and a POVM
$E_1, E_2, \ldots, E_k \in M_n(\mathbb{C})$, resulting in the channel matrix $A$ with entries

$$a_{ij} = \operatorname{tr}(E_i\rho_j).$$
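As an illustration of a quantum channel matrix, the sketch below evaluates $\operatorname{tr}(E_i\rho_j)$ for a qubit strategy in the three-box game. The choice of three "trine" states and the correspondingly scaled POVM is an assumption made here for illustration; it is not a strategy singled out by the paper. Real symmetric matrices are stored as nested lists.

```python
from math import cos, sin, pi

def projector(theta):
    """Rank-one projector onto the real unit vector (cos theta, sin theta)."""
    c, s = cos(theta), sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def trace_of_product(x, y):
    """tr(xy) for square matrices given as nested lists."""
    n = len(x)
    return sum(x[i][m] * y[m][i] for i in range(n) for m in range(n))

# Hypothetical encoding: three pure qubit states with pairwise overlap 1/4.
states = [projector(2 * pi * j / 3) for j in range(3)]
# The three projectors sum to (3/2) times the identity, so scaling by 2/3 gives a POVM.
povm = [[[2 / 3 * entry for entry in row] for row in rho] for rho in states]

# Channel matrix a[i][j] = tr(E_i rho_j) and the resulting expected win.
a = [[trace_of_product(E, rho) for rho in states] for E in povm]
expected_win = sum(a[i][i] for i in range(3)) / 3

for row in a:
    print([round(v, 4) for v in row])
print(round(expected_win, 4))
```

The diagonal entries equal 2/3 and the off-diagonal entries 1/6, so this particular qubit strategy yields an expected win of 2/3, the same value as the best classical bit strategy.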



The set of matrices $A$ we can obtain depends on $n$, $k$ and $l$. We make
the following

Definition 2. Let $Q = Q_n(k, l)$ be the convex hull of $k \times l$ matrices
of the form $(\operatorname{tr}(E_i\rho_j))$, where $E_1, \ldots, E_k \in M_n(\mathbb{C})$ is a POVM and
$\rho_1, \ldots, \rho_l \in M_n(\mathbb{C})$ are density matrices.

It may not be obvious that $Q$ is a polytope, but in fact, our main
result is

Theorem 3. $C = Q$.

We start by proving the trivial inclusion. 

Proof of $C \subset Q$. Assuming $A \in C$, we show that $A \in Q$. Since $Q$ is
convex and invariant under permutations of the rows, we may assume that
$A$ is a stochastic matrix and that only the first $\min(n, k)$ rows of $A$
can be nonzero.

Put $\rho_j = \sum_{i=1}^{\min(n,k)} a_{ij} e_{ii}$. These are density matrices.

When $n \le k$, put $E_i = e_{ii}$ for $i \le n$ and $E_i = 0$ otherwise. When
$n > k$, put $E_i = e_{ii}$ for $i \le k - 1$ and $E_k = \sum_{i=k}^{n} e_{ii}$. In either case,
this is a POVM. We have $\operatorname{tr}(E_i\rho_j) = a_{ij}$, whence $A \in Q$. □

For the proof of the reverse inclusion, we recall the definition and the
positivity property of mixed discriminants.

The determinant is a homogeneous polynomial of degree $n$ on $M_n(\mathbb{C})$.
Therefore, there exists a unique symmetric $n$-linear function $D$ such
that

$$D(X, \ldots, X) = \det X$$

for all $X \in M_n(\mathbb{C})$. This function $D$ is the mixed discriminant. By
[1, Lemma 2(vi)], if $E_1, \ldots, E_n$ are all positive semidefinite Hermitian
matrices, then

$$D(E_1, \ldots, E_n) \ge 0.$$
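For small matrices the mixed discriminant can be computed explicitly. The sketch below uses the polarization identity for symmetric multilinear forms, $n!\,D(X_1, \ldots, X_n) = \sum_{S \subset [n]} (-1)^{n-|S|} \det\bigl(\sum_{i \in S} X_i\bigr)$; this is one standard way of evaluating $D$, not necessarily the one used in the cited reference, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def mixed_discriminant(mats):
    """D(X_1, ..., X_n) for a list of n matrices of size n x n,
    evaluated through the polarization identity."""
    n = len(mats)
    total = 0
    for size in range(1, n + 1):
        for subset in combinations(mats, size):
            summed = [[sum(x[i][j] for x in subset) for j in range(n)] for i in range(n)]
            total += (-1) ** (n - size) * det(summed)
    return Fraction(total) / factorial(n)

X = [[2, 1], [1, 3]]
E1 = [[1, 0], [0, 0]]
E2 = [[0, 0], [0, 1]]
print(mixed_discriminant([X, X]))    # 5, which equals det X
print(mixed_discriminant([E1, E2]))  # 1/2, nonnegative as the lemma asserts
```

The first call checks the defining property $D(X, X) = \det X$; the second shows a strictly positive value on two singular positive semidefinite matrices.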

Proof of $Q \subset C$. Assume that $A \in Q$. We prove that $A \in C$. We may
assume that $a_{ij} = \operatorname{tr}(E_i\rho_j)$, where $E_1, \ldots, E_k \in M_n(\mathbb{C})$ is a POVM and
$\rho_1, \ldots, \rho_l \in M_n(\mathbb{C})$ are density matrices.
For $I = (i_1, \ldots, i_n) \in [k]^n$, put

$$p_I = D(E_{i_1}, \ldots, E_{i_n}),$$

where $D$ is the mixed discriminant. We have $p_I \ge 0$ for all $I$. Thus,
we get a measure $P$ on $[k]^n$ defined by $P(S) = \sum_{I \in S} p_I$. Using the
multilinearity of $D$ and the assumption that $E_1, \ldots, E_k$ is a partition
of unity, we see that

$$P([k]^n) = D(1, \ldots, 1) = \det 1 = 1,$$

so $P$ is a probability measure. In fact, for any $R \subset [k]$, we may put
$E_R = \sum_{i \in R} E_i$, and then we have

$$P(R^n) = \det E_R.$$

Since $0 \le E_R \le 1$, all eigenvalues of $E_R$ are in $[0, 1]$. Thus, $\det E_R$, the
product of the eigenvalues, does not exceed the smallest eigenvalue. Hence,
$(\det E_R) 1 \le E_R$, so, for all $j$,

$$\operatorname{tr}(E_R\rho_j) \ge \operatorname{tr}((\det E_R) 1 \rho_j) = \det E_R.$$

The left hand side here is $A_j(R)$, where $A_j$ is the probability measure
on $[k]$ given by the $j$-th column of $A$. So we have

$$A_j(R) \ge P(R^n) \quad \text{for all } R \subset [k].$$

Let us connect $I \in [k]^n$ to $i \in [k]$ by an edge if $i$ occurs in $I$. This
gives us a bipartite graph. The neighborhood of any set $S \subset [k]^n$ is
the set $R \subset [k]$ of indices occurring in some element of $S$. We always
have $S \subset R^n$, whence $A_j(R) \ge P(R^n) \ge P(S)$. Thus, by the
Supply-Demand Theorem, there exists a probability measure $P_j$ on $[k]^n \times [k]$
which is supported on the edges of the graph and has marginals $P$ and
$A_j$. Whenever $p_I \neq 0$, let $B(I)$ be the $k \times l$ stochastic matrix whose
$j$-th column is given by the conditional distribution $P_j \mid I$ on $[k]$. Then
$B(I)$ has at most $n$ nonzero rows, and $A = \int B \, dP \in C$. □
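The key inequalities of the proof can be checked numerically in a small case. The sketch below takes $n = 2$, $k = l = 3$ with a qubit "trine" POVM and trine states (an illustrative assumption, not an example from the paper) and verifies that the weights $p_I$ sum to 1, that $P(R^n) = \det E_R$, and that $A_j(R) \ge P(R^n)$. It does not construct the coupling $P_j$ given by the Supply-Demand Theorem.

```python
from itertools import combinations, product
from math import cos, sin, pi

def projector(theta):
    c, s = cos(theta), sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mixed_discriminant2(x, y):
    """D(x, y) for 2 x 2 matrices: (det(x + y) - det x - det y) / 2."""
    summed = [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]
    return (det2(summed) - det2(x) - det2(y)) / 2

def trace_of_product(x, y):
    return sum(x[i][m] * y[m][i] for i in range(2) for m in range(2))

K = 3
states = [projector(2 * pi * j / 3) for j in range(K)]                 # hypothetical rho_j
povm = [[[2 / 3 * v for v in row] for row in rho] for rho in states]   # hypothetical E_i

# Weights p_I = D(E_{i_1}, E_{i_2}) for I in [k]^2 (indices start at 0 here).
p = {I: mixed_discriminant2(povm[I[0]], povm[I[1]]) for I in product(range(K), repeat=2)}
total_mass = sum(p.values())

def check_conditions(tol=1e-12):
    """Check P(R^n) = det E_R and A_j(R) >= P(R^n) for all nonempty R and all j."""
    for size in range(1, K + 1):
        for R in combinations(range(K), size):
            E_R = [[sum(povm[i][a][b] for i in R) for b in range(2)] for a in range(2)]
            P_Rn = sum(p[I] for I in product(R, repeat=2))
            if abs(P_Rn - det2(E_R)) > tol:
                return False
            if any(trace_of_product(E_R, rho) < P_Rn - tol for rho in states):
                return False
    return True

print(round(total_mass, 12), check_conditions())
```

In this example the weight $p_I$ is 1/6 on each pair of distinct indices and 0 on repeated indices, and the inequality $A_j(R) \ge P(R^n)$ is tight for two-element sets $R$ not containing $j$.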

Remark 4. (Infinite channel matrices.) We may replace $[l]$ by an
arbitrary index set $J$, and $[k]$ by a Hausdorff topological space $X$. Let
$\mathcal{M}(X)$ be the space of Radon probability measures on $X$, endowed with
the weak topology. In this setting, the space of $k \times l$ stochastic matrices
is replaced by $\mathcal{M}(X)^J$, endowed with the product topology. By the
convex hull of a set $S \subset \mathcal{M}(X)^J$, we mean the set of all expectations
$\mathbb{E}s$, where $s : \Omega \to S$ is a random variable on any probability space $\Omega$.
Let $C_n(X, J)$ be the convex hull of $\bigcup \{\mathcal{M}(N)^J : N \subset X, |N| \le n\}$. Let
$Q_n(X, J)$ be the convex hull of all elements of the form $\operatorname{tr}(E\rho)$, where
$E : \mathcal{B}(X) \to M_n(\mathbb{C})$ is a positive operator valued Radon probability
measure on the Borel sets of $X$, and $\rho$ is a map from $J$ to the set of $n \times n$
density matrices. Then $C_n(X, J) = Q_n(X, J)$. The proof is essentially
the same as that of Theorem 3 and is left to the reader. Note that we
need a topological version of the Supply-Demand Theorem, namely [3,
Theorem 3.8].



P. E. FRENKEL AND M. WEINER



3. Inequalities for POVM's and density matrices 

As before, let $n$, $k$ and $l$ be positive integers. If a linear inequality is
satisfied by the entries of any $k \times l$ stochastic 0-1 matrix with at most
$n$ nonzero rows, then we can use Theorem 3 to deduce that it is also
satisfied by the entries of any $A \in Q_n(k, l)$. This is a way to get new
inequalities for POVM's and density matrices. Therefore, we want to
find inequalities satisfied by all $A \in C_n(k, l)$.

When $n \ge \min(k, l)$, the polytope $C_n(k, l)$ is obviously the set of all
stochastic $k \times l$ matrices and we do not get anything interesting.

In general, we are not able to describe the faces of the polytope
$C_n(k, l)$. However, it is clear that a $k \times l$ real matrix $A$ belongs to the
polytope if and only if it satisfies all linear inequalities

$$\operatorname{tr}(CA) \ge c \qquad (C \in \mathbb{R}^{l \times k},\ c \in \mathbb{R}) \tag{3.1}$$

that the vertices satisfy. The vertices are the stochastic 0-1 matrices $A$
with at most $n$ nonzero rows, and all of them satisfy (3.1) if and only
if for all $N \subset [k]$, $|N| = n$, we have

$$\sum_{r} \min_{i \in N} c_{ri} \ge c \tag{3.2}$$

for the entries of the matrix $C = (c_{ri})$. For example, if $C$ is a 0-1 matrix
such that any $n$ columns have at least one 1 at the same position, and
$c = 1$, then the inequalities (3.2) hold, and therefore $\operatorname{tr}(CA) \ge 1$ for all
$A$ in the polytope. E.g., let $n = 2$ and

$$C = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 1/6 & 0 & 1/2 & 1/2 \\ 1/6 & 1/2 & 0 & 1/2 \\ 1/6 & 1/2 & 1/2 & 0 \end{pmatrix}; \tag{3.3}$$

then $\operatorname{tr}(CA) = 1/2$, so $A$ is not in the polytope $C_2(4, 4) = Q_2(4, 4)$.
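The example can be verified with exact arithmetic. The sketch below uses the matrices $C$ and $A$ with the entries $C$ = rows (0,1,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1) and $A$ = rows (1/2,0,0,0), (1/6,0,1/2,1/2), (1/6,1/2,0,1/2), (1/6,1/2,1/2,0); the helper names are illustrative. It checks that the left side of (3.2) is at least 1 for every two-element $N$, and that $\operatorname{tr}(CA) = 1/2$.

```python
from fractions import Fraction as F
from itertools import combinations

C = [[0, 1, 1, 1],
     [1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1]]

h, s, z = F(1, 2), F(1, 6), F(0)
A = [[h, z, z, z],
     [s, z, h, h],
     [s, h, z, h],
     [s, h, h, z]]

def trace_of_product(c, a):
    """tr(CA) for an l x k matrix c and a k x l matrix a."""
    return sum(c[r][i] * a[i][r] for r in range(len(c)) for i in range(len(a)))

def vertex_bound(c, n):
    """Minimum over all N with |N| = n of sum_r min_{i in N} c[r][i],
    i.e. the smallest value of the left side of (3.2)."""
    k = len(c[0])
    return min(
        sum(min(row[i] for i in N) for row in c)
        for N in combinations(range(k), n)
    )

column_sums = [sum(A[i][j] for i in range(4)) for j in range(4)]
print(vertex_bound(C, 2))        # 1
print(trace_of_product(C, A))    # 1/2
print(column_sums)               # every column of A sums to 1
```

So $A$ is stochastic, every vertex of $C_2(4,4)$ satisfies $\operatorname{tr}(CA) \ge 1$, and $A$ violates this inequality.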

Now observe that if $C$ is a matrix of size $m \times k$, with arbitrary $m$,
such that the inequalities (3.2) hold, then any vertex, and therefore
any point $A$ of the polytope $C_n(k, l)$ satisfies

$$\sum_{r} \min_{j} (CA)_{rj} \ge c, \tag{3.4}$$

which is stronger than (3.1). Note that (3.4) can be rewritten as a
system of $l^m$ linear inequalities holding simultaneously. Namely, we
choose a $j_r$ for each $r$ and write

$$\sum_{r} (CA)_{r j_r} \ge c. \tag{3.5}$$



As an example, choose $n \le r \le k$, let $m = \binom{k}{r}$, and let the rows of
$C$ be indexed by the $r$-element subsets $S$ of $[k]$. Let $c_{S,i} = 1$ if $i \in S$
and zero otherwise. Then the inequalities (3.2) hold with

$$c = \binom{k-n}{r-n} = \binom{k-n}{k-r},$$

and therefore (3.4) also holds, i.e.,

$$\sum_{|S| = r} \min_{j \in [l]} \sum_{i \in S} a_{ij} \ge \binom{k-n}{k-r}$$

for all $A$ in the polytope. Replacing $r$ by $k - r$ and $S$ by $[k] - S$, and
using that $A$ is stochastic, we get

$$\sum_{|S| = r} \max_{j \in [l]} \sum_{i \in S} a_{ij} \le \binom{k}{r} - \binom{k-n}{r} \tag{3.6}$$

for all $A$ in the polytope, whenever $0 \le r \le k - n$. When $r = 1$, this is
immediate for $A \in Q_n(k, l)$ from the fact that for any density matrix $\rho$,
we have $\rho \le 1$ and so $\operatorname{tr}(E\rho) \le \operatorname{tr} E$ for any psdh matrix $E$. However,
when $r \ge 2$, the inequalities (3.6) seem nontrivial for $A \in Q_n(k, l)$,
and they only follow from Theorem 3. Note that the inequalities (3.6)
are not sufficient to describe the polytope. Indeed, for $k = l = 4$ and
$n = 2$, the matrix $A$ in (3.3) satisfies (3.6) for all $r$, but is not in the
polytope.
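The last claim can be checked directly. The sketch below assumes that (3.6) reads $\sum_{|S|=r} \max_{j} \sum_{i \in S} a_{ij} \le \binom{k}{r} - \binom{k-n}{r}$ for $0 \le r \le k - n$, and that $A$ is the matrix of (3.3) with rows (1/2,0,0,0), (1/6,0,1/2,1/2), (1/6,1/2,0,1/2), (1/6,1/2,1/2,0); the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations
from math import comb

h, s, z = F(1, 2), F(1, 6), F(0)
A = [[h, z, z, z],
     [s, z, h, h],
     [s, h, z, h],
     [s, h, h, z]]

def lhs_36(a, r):
    """Left side of (3.6): sum over r-element row sets S of the largest column sum over S."""
    k, l = len(a), len(a[0])
    return sum(
        max(sum(a[i][j] for i in S) for j in range(l))
        for S in combinations(range(k), r)
    )

def satisfies_36(a, n):
    """Check (3.6) for every r with 0 <= r <= k - n."""
    k = len(a)
    return all(
        lhs_36(a, r) <= comb(k, r) - comb(k - n, r)
        for r in range(k - n + 1)
    )

print(lhs_36(A, 1), lhs_36(A, 2))   # 2 and 5, matching the bounds 4 - 2 and 6 - 1
print(satisfies_36(A, 2))           # True
```

For $n = 2$ both nontrivial inequalities hold with equality, which is consistent with $A$ lying outside the polytope only because of other inequalities such as $\operatorname{tr}(CA) \ge 1$.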

These examples already show that we have a huge freedom in choosing
$C$, and the combinatorics of finite families of sets enters into the
subject of finding dimension dependent linear inequalities for values
$\operatorname{tr}(E_i\rho_j)$. This could turn out to be interesting. For example, think
of the famous question of mutually unbiased bases. A complete set
of mutually unbiased bases can also be described as a set of density
operators $\rho_1, \ldots, \rho_{n(n+1)} \in M_n(\mathbb{C})$ with $(\operatorname{tr}(\rho_i\rho_j))_{i,j=1}^{n(n+1)}$ being certain
prescribed values; see e.g. [2] for a good review or [10] for some recent
development. If $n$ is a power of a prime, then such systems exist,
while for other dimensions such systems are believed not to exist,
although this has not yet been proved. Thus, to prove nonexistence,
one will have to use inequalities (or other objects) that show nontrivial
dependence on the dimension $n$.

To close, we mention the slightly related open question of describing
the set of $k \times k$ matrices $X = (\operatorname{tr}(A_iA_j))$, where $k$ is fixed, $n$ is
arbitrary, and $A_1, \ldots, A_k$ are psdh $n \times n$ matrices. Clearly, $X$ is positive
semidefinite, with nonnegative real entries. However, not all positive
semidefinite and entrywise nonnegative matrices can be written in this
form, even if $n$ is arbitrary; see [4].

Acknowledgement. The authors wish to thank Aidan J. Klobuchar
and conversations.